Keyword [LGRAN]

Wang P, Wu Q, Cao J, et al. Neighbourhood watch: Referring expression comprehension via language-guided graph attention networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 1960-1968.

1. Overview

1.1. Motivation

MAttNet. Language and region features are learned or designed independently, without informing each other.

The paper proposes the Language-guided Graph Attention Network (LGRAN), which consists of:
1) Language Self-attention Module
2) Language-guided Graph Attention (node attention & edge attention)

This makes the referring expression decision both visualisable and explainable.

directed graph
node. an object from the set of proposals or ground-truth (GT) boxes
edge
- intra-class edge. spatial relationship
- inter-class edge. spatial relationship + the other object’s visual feature
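
The graph structure above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes "class" means the object category label, and the `Obj` class, the 5-d `rel_spatial` encoding, the fully connected neighbourhood, and the `(centre, neighbour, feature)` edge layout are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Obj:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed non-degenerate
    label: str                              # object category (assumed to be available)
    feat: List[float]                       # visual feature placeholder


def rel_spatial(a: Obj, b: Obj) -> List[float]:
    """Hypothetical encoding of b's position relative to a, scaled by a's size."""
    ax1, ay1, ax2, ay2 = a.box
    bx1, by1, bx2, by2 = b.box
    w, h = ax2 - ax1, ay2 - ay1
    return [
        (bx1 - ax1) / w,
        (by1 - ay1) / h,
        (bx2 - ax2) / w,
        (by2 - ay2) / h,
        ((bx2 - bx1) * (by2 - by1)) / (w * h),
    ]


def build_graph(objs: List[Obj]):
    """Return (intra_edges, inter_edges) as lists of (centre, neighbour, feature)."""
    intra, inter = [], []
    for i, a in enumerate(objs):
        for j, b in enumerate(objs):
            if i == j:
                continue  # no self-loops in this sketch
            spatial = rel_spatial(a, b)
            if a.label == b.label:
                # intra-class edge: spatial relationship only
                intra.append((i, j, spatial))
            else:
                # inter-class edge: spatial relationship + neighbour's visual feature
                inter.append((i, j, spatial + b.feat))
    return intra, inter


objs = [
    Obj((0, 0, 10, 20), "person", [0.1, 0.2]),
    Obj((30, 0, 40, 20), "person", [0.3, 0.4]),
    Obj((15, 10, 25, 20), "dog", [0.5, 0.6]),
]
intra_edges, inter_edges = build_graph(objs)
```

Because the graph is directed, every pair of objects contributes two edges, one from each object's point of view, and the two edges carry different relative-position features.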

1.2. Dataset

RefCOCO
RefCOCO+
RefCOCOg

2. LGRAN

2.1. Problem

1) Given an image $I$ and a referring expression $r$, localise the object $o'$ referred to by $r$ from the object set $O=\{o_i\}_{i=1}^{N}$ of $I$.
2) $O$ is given as GT or obtained by a proposal generation method.
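
One way to read the problem statement is as selecting the candidate in $O$ that best matches the expression. The sketch below assumes an argmax over a matching score; `localise`, `toy_score`, and the tag-based objects are hypothetical stand-ins for illustration, and the word-overlap score is only a placeholder for a learned model.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def localise(objects: Sequence[T], expression: str,
             score: Callable[[T, str], float]) -> int:
    """Return the index of the object with the highest matching score."""
    if not objects:
        raise ValueError("object set O must be non-empty")
    return max(range(len(objects)), key=lambda i: score(objects[i], expression))


def toy_score(obj: dict, expression: str) -> float:
    # Placeholder score: number of expression words found in the object's tags.
    words = set(expression.lower().split())
    return float(len(words & set(obj["tags"])))


# O: candidate objects, here described by hand-written tags (an assumption).
objects = [
    {"tags": {"man", "left"}},
    {"tags": {"man", "right", "hat"}},
    {"tags": {"dog"}},
]
best = localise(objects, "man on the right with hat", toy_score)
print(best)  # index of the selected object
```

The candidate set is passed in as-is, so the same function applies whether $O$ comes from GT boxes or from a proposal generator.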